MIRACLE's Approach to Multilingual Web Retrieval
نویسندگان
چکیده
For MIRACLE participation on WebClef 2005, a set of independent indexes was constructed for each top level domain of the EuroGOV collection. Each of these indexes contains information extracted from the document, like URL, title, keywords, detected named entities or HTML headers. These indexes are queried to obtain partial document rankings, which are combined with various relative weights to test the value of each index. The trie based indexing and retrieval engine developed by the MIRACLE team is now fully functional and has been adapted to the WebClef environment and employed in this campaign. Other tools, such as the Named Entities Recognizer based on a finite automaton, have also been developed.
منابع مشابه
MIRACLE's 2005 Approach to Cross-lingual Information Retrieval
This paper presents the 2005 Miracle’s team approach to Bilingual and Multilingual Information Retrieval. In the multilingual track, we have concentrated our work on the merging process of the results of monolingual runs to get the multilingual overall result, relying on available translations. In the bilingual and multilingual tracks, we have used available translation resources, and in some c...
متن کاملSupporting Multilingual Information Retrieval in Web Applications: An English-Chinese Web Portal Experiment
Cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) techniques have been widely studied, but they are not often applied to and evaluated for Web applications. In this paper, we present our research in developing and evaluating a multilingual English-Chinese Web portal in the business domain. A dictionary-based approach has been adopted that combines phrasal...
متن کاملA multilingual text mining approach to web cross-lingual text retrieval
To enable concept-based cross-lingual text retrieval (CLTR) using multilingual text mining, our approach will first discover the multilingual concept–term relationships from linguistically diverse textual data relevant to a domain. Second, the multilingual concept–term relationships, in turn, are used to discover the conceptual content of the multilingual text, which is either a document contai...
متن کاملMIRACLE's 2005 Approach to Geographical Information Retrieval
This paper presents the 2005 MIRACLE’s team approach to Cross-Language Geographical Retrieval (GeoCLEF). The main goal of the GeoCLEF participation of the MIRACLE team was to test the effect that geographical information retrieval techniques cause to information retrieval. The baseline approach is based on the development of named entity recognition and geospatial information retrieval tools an...
متن کاملExploiting the Web as the multilingual corpus for unknown query translation
Users’ cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term transla...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005